warning system design and testing

Michael Liljenstam , David M. Nicol , Vincent H. Berk , Robert S. Gray 
Proceedings of the 2003 ACM workshop on Rapid Malcode October 2003 

Reproducing the effects of large-scale worm attacks in a laboratory setup in a realistic 
and reproducible manner is an important issue for the development of worm 
detection and defense systems. In this paper, we describe a worm simulation model 
we are developing to accurately model the large-scale spread dynamics of a worm 
and many aspects of its detailed effects on the network. We can model slow or fast 
worms with realistic scan rates on realistic IP address spaces and selectively model 
local d ... 



3 Tools/platforms: Tools and techniques for performance measurement of 77%
large distributed multiagent systems

Aaron Helsinger , Richard Lazarus , William Wright , John Zinky 

Proceedings of the second international joint conference on Autonomous agents
and multiagent systems July 2003
Performance measurement of large distributed multiagent systems (MAS) offers
challenges that must be addressed explicitly in the agent infrastructure. Performance 
data is widely distributed and voluminous, and poor data collection can impact the 
operation of the system itself. However, performance metrics are essential to internal 
system function, e.g., autonomous adaptation to dynamic environments, as well as to 
external assessment. In this paper we describe the tools, techniques, and results o ... 

4 Fast detection of communication patterns in distributed executions 77%
Thomas Kunz , Michiel F. H. Seuren 

Proceedings of the 1997 conference of the Centre for Advanced Studies on 
Collaborative research November 1997 

Understanding distributed applications is a tedious and difficult task. Visualizations 
based on process-time diagrams are often used to obtain a better understanding of 
the execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very 
complex and do not provide the user with the desired overview of the application. In 
our experience, such tools display repeated occurrences of non-trivial commun ... 

5 Improving the browsing experience: Comparing link marker visualization 77%
techniques: changes in reading behavior

Hartmut Obendorf , Harald Weinreich 

Proceedings of the twelfth international conference on World Wide Web May 2003 
Links are one of the most important means for navigation in the World Wide Web. 
However, the visualization of and the interaction with Web links have been scarcely 
explored, although Links have severe implications on the appearance and usability of 
Web pages and the World Wide Web as such. This paper presents two studies giving
first insights of the effects of link visualization techniques on reading habits and 
performance. The first user study compares different highlighting techniques for 
link ... 

6 Scalable feature selection, classification and signature generation for 77% 
organizing large text databases into hierarchical topic taxonomies 

Soumen Chakrabarti , Byron Dom , Rakesh Agrawal , Prabhakar Raghavan 

The VLDB Journal — The International Journal on Very Large Data Bases August 1998
Volume 7 Issue 3

We explore how to organize large text databases hierarchically by topic to aid better 
searching, browsing and filtering. Many corpora, such as internet directories, digital 
libraries, and patent databases are manually organized into topic hierarchies, also 
called taxonomies. Similar to indices for relational data, taxonomies make search and 
access more efficient. However, the exponential growth in the volume of on-line 
textual information makes it nearly impossible to maintain such taxono ... 

7 Novel architectures: A pipelined configurable gate array for embedded 77%
processors

Andrea Lodi , Mario Toma , Fabio Campi 

Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field
programmable gate arrays February 2003

In recent years the challenge of high performance, low power retargettable 
embedded system has been faced with different technological and architectural 
solutions. In this paper we present a new configurable unit explicitly designed to 
implement additional reconfigurable pipelined datapaths, suitable for the design of 
reconfigurable processors. A VLIW reconfigurable processor has been implemented on 
silicon in a standard 0.18 µm CMOS technology to prove the effectiveness of the
prop ... 

8 Types and persistence in database programming languages 77%
Malcolm P. Atkinson , O. Peter Buneman

ACM Computing Surveys (CSUR) June 1987
Volume 19 Issue 2 

Traditionally, the interface between a programming language and a database has 
either been through a set of relatively low-level subroutine calls, or it has required 
some form of embedding of one language in another. Recently, the necessity of 
integrating database and programming language techniques has received some long- 
overdue recognition. In response, a number of attempts have been made to construct 
programming languages with completely integrated database management systems. 
These lang ... 

9 The Hearsay-II Speech-Understanding System: Integrating Knowledge 77%
to Resolve Uncertainty 

Lee D. Erman , Frederick Hayes-Roth , Victor R. Lesser , D. Raj Reddy 
ACM Computing Surveys (CSUR) June 1980 
Volume 12 Issue 2 

10 Office Information Systems and Computer Science 77%

Clarence A. Ellis , Gary J. Nutt 
ACM Computing Surveys (CSUR) January 1980 
Volume 12 Issue 1 

11 Data clustering: a review 77%

A. K. Jain , M. N. Murty , P. J. Flynn 
ACM Computing Surveys (CSUR) September 1999 
Volume 31 Issue 3 

Clustering is the unsupervised classification of patterns (observations, data items, or 
feature vectors) into groups (clusters). The clustering problem has been addressed in 
many contexts and by researchers in many disciplines; this reflects its broad appeal 
and usefulness as one of the steps in exploratory data analysis. However, clustering 
is a difficult problem combinatorially, and differences in assumptions and contexts in 
different communities has made the transfer of useful generic co ... 

12 A study of the applicability of existing exception-handling techniques to 77%
component-based real-time software technology 

Jun Lang , David B. Stewart 

ACM Transactions on Programming Languages and Systems (TOPLAS) March 1998 
Volume 20 Issue 2 

This study focuses on the current state of error-handling technology and concludes 
with recommendations for further research in error handling for component-based 
real-time software. With real-time programs growing in size and complexity, the 
quality and cost of developing and maintaining them are still deep concerns to 
embedded software industries. Component-based software is a promising approach in 
reducing development cost while increasing quality and reliability. As with any other 
real- ... 

13 Multimedia backroads (panel): low bandwidth implementations 77%

John M. Danskin , Andres Albanese , Geoffrey M. Davis , Paul G. Jensen 
Proceedings of the third ACM international conference on Multimedia January
1995



14 Extending a graphical toolkit for two-handed interaction

Stephane Chatty 

Proceedings of the 7th annual ACM symposium on User interface software and 
technology November 1994 

Multimodal interaction combines input from multiple sensors such as pointing devices 
or speech recognition systems, in order to achieve more fluid and natural interaction. 
Two-handed interaction has been used recently to enrich graphical interaction. 
Building applications that use such combined interaction requires new software 
techniques and frameworks. Using additional devices means that user interface 
toolkits must be more flexible with regard to input devices and event types. The 
possib ... 

Poster papers: Topic-conditioned novelty detection
Yiming Yang , Jian Zhang , Jaime Carbonell , Chun Jin 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge
discovery and data mining July 2002

Automated detection of the first document reporting each new event in temporally- 
sequenced streams of documents is an open challenge. In this paper we propose a 
new approach which addresses this problem in two stages: 1) using a supervised 
learning algorithm to classify the on-line document stream into pre-defined broad 
topic categories, and 2) performing topic-conditioned novelty detection for documents 
in each topic. We also focus on exploiting named-entities for event-level novelty 
detection ... 



80% 



2 Poster papers: Incremental context mining for adaptive document
classification

Rey-Long Liu , Yun-Ling Lu 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge
discovery and data mining July 2002

Automatic document classification (DC) is essential for the management of 
information and knowledge. This paper explores two practical issues in DC: (1) each 
document has its context of discussion, and (2) both the content and vocabulary of 
the document database is intrinsically evolving. The issues call for adaptive document 
classification (ADC) that adapts a DC system to the evolving contextual requirement 
of each document category, so that input documents may be classifie ... 



3 Text classification: A refinement approach to handling model misfit in 
text categorization 

Haoran Wu , Tong Heng Phang , Bing Liu , Xiaoli Li 

Proceedings of the eighth ACM SIGKDD international conference on Knowledge
discovery and data mining July 2002

Text categorization or classification is the automated assigning of text documents to 
pre-defined classes based on their contents. This problem has been studied in 
information retrieval, machine learning and data mining. So far, many effective 
techniques have been proposed. However, most techniques are based on some 
underlying models and/or assumptions. When the data fits the model well, the 
classification accuracy will be high. However, when the data does not fit the model 
well, the classificat ... 



4 Classification: Boosting to correct inductive bias in text classification
Yan Liu , Yiming Yang , Jaime Carbonell

Proceedings of the eleventh international conference on Information and
knowledge management November 2002 

This paper studies the effects of boosting in the context of different classification 
methods for text categorization, including Decision Trees, Naive Bayes, Support 
Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive 
biases of each classifier and explore how boosting, as an error-driven resampling 
mechanism, reacts to those biases. Our experiments on the Reuters-21578 
benchmark show that boosting is not effective in improving the performance of the 
base classifiers ... 



5 Machine learning in automated text categorization 
Fabrizio Sebastiani

ACM Computing Surveys (CSUR) March 2002
Volume 34 Issue 1 

The automated categorization (or classification) of texts into predefined categories 
has witnessed a booming interest in the last 10 years, due to the increased 
availability of documents in digital form and the ensuing need to organize them. In 
the research community the dominant approach to this problem is based on machine 
learning techniques: a general inductive process automatically builds a classifier by 
learning, from a set of preclassified documents, the characteristics of the categories. 



6 Maximum likelihood estimation for filtering thresholds 

Yi Zhang , Jamie Callan 

Proceedings of the 24th annual international ACM SIGIR conference on Research 
and development in information retrieval September 2001 

Information filtering systems based on statistical retrieval models usually compute a 
numeric score indicating how well each document matches each profile. Documents 
with scores above profile-specific dissemination thresholds are delivered. An optimal
dissemination threshold is one that maximizes a given utility function based on the 
distributions of the scores of relevant and non-relevant documents. The parameters 
of the distribution can be estimated using releva ... 



7 The score-distributional threshold optimization for adaptive binary 
classification tasks 

Avi Arampatzis , Andre van Hameren

Proceedings of the 24th annual international ACM SIGIR conference on Research
and development in information retrieval September 2001

The thresholding of document scores has proved critical for the effectiveness of 
classification tasks. We review the most important approaches to thresholding, and 
introduce the score-distributional (S-D) threshold optimization method. The method is
based on score distributions and is capable of optimizing any effectiveness measure
defined in terms of the traditional contingency table. As a byproduct, we provide a
model for score distributions, and d ...



8 An information-theoretic approach to automatic query expansion
Claudio Carpineto , Renato de Mori , Giovanni Romano , Brigitte Bigi 
ACM Transactions on Information Systems (TOIS) January 2001 
Volume 19 Issue 1 

Techniques for automatic query expansion from top retrieved documents have shown 
promise for improving retrieval effectiveness on large collections; however, they often 
rely on an empirical ground, and there is a shortage of cross-system comparisons. 
Using ideas from Information Theory, we present a computationally simple and 
theoretically justified method for assigning scores to candidate expansion terms. Such 
scores are used to select and weight expansion terms within Rocchio's framework ... 



9 Boosting for document routing 

Raj D. Iyer , David D. Lewis , Robert E. Schapire , Yoram Singer , Amit Singhal 
Proceedings of the ninth international conference on Information and knowledge 
management November 2000 



10 Text filtering by boosting naive Bayes classifiers 

Yu-Hwan Kim , Shang-Yoon Hahn , Byoung-Tak Zhang 
Proceedings of the 23rd annual international ACM SIGIR conference on Research
and development in information retrieval July 2000



Several machine learning algorithms have recently been used for text categorization 
and filtering. In particular, boosting methods such as AdaBoost have shown good 
performance applied to real text data. However, most of existing boosting algorithms 
are based on classifiers that use binary-valued features. Thus, they do not fully make 
use of the weight information provided by standard term weighting methods. In this 
paper, we present a boosting-based learning method for text filtering that use ... 



11 Improving text categorization methods for event tracking 
Yiming Yang , Tom Ault , Thomas Pierce , Charles W. Lattimer 

Proceedings of the 23rd annual international ACM SIGIR conference on Research 
and development in information retrieval July 2000 



Automated tracking of events from chronologically ordered document streams is a 
new challenge for statistical text classification. Existing learning techniques must be 
adapted or improved in order to effectively handle difficult situations where the 
number of positive training instances per event is extremely small, the majority of 
training documents are unlabelled, and most of the events have a short duration in 
time. We adapted several supervised text categorization methods, specifically se ... 



12 Relevance feedback with a small number of relevance judgements: 80%
incremental relevance feedback vs. document clustering

Makoto Iwayama 

Proceedings of the 23rd annual international ACM SIGIR conference on Research
and development in information retrieval July 2000

The use of incremental relevance feedback and document clustering were 
investigated in a relevance feedback environment in which the number of relevance
judgements was quite small. Through experiments on the TREC collection, the
incremental relevance feedback approach was found not to improve the overall search
effectiveness. The clustering approach was found to be promising, although it 
sometimes over-focuses on a particular topic in a query and ignores the others. To 
overcome this problem, ... 



13 Text classification using ESC-based stochastic decision lists
Hang Li , Kenji Yamanishi 

Proceedings of the eighth international conference on Information and
knowledge management November 1999 

We propose a new method of text classification using stochastic decision lists. A 
stochastic decision list is an ordered sequence of IF-THEN rules, and our method can 
be viewed as a rule-based method for text classification having advantages of 
readability and refinability of acquired knowledge. Our method is unique in that 
decision lists are automatically constructed on the basis of the principle of minimizing 
Extended Stochastic Complexity (ESC), and with it we are able to construct decis ... 



14 An Internet-based newspaper filtering and personalization system 
(demonstration abstract)

Aleksander Kolcz , Joshua Alspector

Proceedings of the 22nd annual international ACM SIGIR conference on Research 
and development in information retrieval August 1999 



15 A hidden Markov model information retrieval system 

David R. H. Miller , Tim Leek , Richard M. Schwartz 

Proceedings of the 22nd annual international ACM SIGIR conference on Research 
and development in information retrieval August 1999 



16 Context-sensitive learning methods for text categorization 

William W. Cohen , Yoram Singer

ACM Transactions on Information Systems (TOIS) April 1999
Volume 17 Issue 2 

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts
for phrases, are evaluated on a number of large text categorization problems. These 
algorithms both construct classifiers that allow the "context" of a word w to affect 
how (or even whether) the presence or absence of w will contribute to a classification. 
However, RIPPER and sleeping-experts differ radically in many other respects: ... 



17 Boosting and Rocchio applied to text filtering 

Robert E. Schapire , Yoram Singer , Amit Singhal 

Proceedings of the 21st annual international ACM SIGIR conference on Research 
and development in information retrieval August 1998 




18 Information access and retrieval: Expanding domain-specific lexicons by 77%
term categorization

Henri Avancini , Alberto Lavelli , Bernardo Magnini , Fabrizio Sebastiani , Roberto Zanoli 
Proceedings of the 2003 ACM symposium on Applied computing March 2003
We discuss an approach to the automatic expansion of domain-specific lexicons by 
means of term categorization, a novel task employing techniques from information 
retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such 
lexicons as a process of learning previously unknown associations between terms and 
domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a

19 Minimum majority classification and boosting

Philip M. Long 

Eighteenth national conference on Artificial intelligence July 2002

Motivated by a theoretical analysis of the generalization of boosting, we examine 
learning algorithms that work by trying to fit data using a simple majority vote over a 
small number of a collection of hypotheses. We provide experimental evidence that 
an algorithm based on this principle outputs hypotheses that often generalize nearly 
as well as those output by boosting, and sometimes better. We also provide 
experimental evidence for an additional reason that boosting algorithms generalize 
well, ... 



20 Special issue on Machine learning methods for text and images: A family 77%
of additive online algorithms for category ranking
Koby Crammer , Yoram Singer 

The Journal of Machine Learning Research March 2003 
Volume 3 

We describe a new family of topic-ranking algorithms for multi-labeled documents. 
The motivation for the algorithms stems from recent advances in online learning
algorithms. The algorithms are simple to implement and are also time and memory 
efficient. We provide a unified analysis of the family of algorithms in the mistake 
bound model. We then discuss experiments with the proposed family of topic-ranking 
algorithms on the Reuters-21578 corpus and the new corpus released by Reuters in 
2000. On bo ... 

Proceedings of the fifth ACM SIGKDD international conference on Knowledge

1 On-line learning and the metrical task system problem 82%

Avrim Blum , Carl Burch

Proceedings of the tenth annual conference on Computational learning theory July
1997



2 Static optimality and dynamic search-optimality in lists and trees 

Avrim Blum , Shuchi Chawla , Adam Kalai 

Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete 
algorithms January 2002 

Adaptive data structures form a central topic of on-line algorithms research, beginning 
with the results of Sleator and Tarjan showing that splay trees achieve static 
optimality for search trees, and that Move-to-Front is constant competitive for the list 
update problem [ST85a, ST85b]. This paper is inspired by the observation that one 
can in fact achieve a 1 + ε ratio against the best static object in hindsight for a wide
range of data structure problems via "weighted experts" te ... 



3 Tracking the best linear predictor 
Mark Herbster , Manfred K. Warmuth

The Journal of Machine Learning Research September 2001
Volume 1 

In most on-line learning research the total on-line loss of the algorithm is compared to 
the total loss of the best off-line predictor u from a comparison class of predictors. We 
call such bounds static bounds. The interesting feature of these bounds is that they 
hold for an arbitrary sequence of examples. Recently some work has been done where 
the predictor u_t at each trial t is allowed to change with time, and the total on-line
loss of the algorithm is co ...



4 Online algorithms for market clearing 

Avrim Blum , Tuomas Sandholm , Martin Zinkevich 

Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete
algorithms January 2002

In this paper we study the problem of online market clearing where there is one 
commodity in the market, being bought and sold by multiple buyers and sellers who 
submit buy and sell bids that arrive and expire at different times. The auctioneer is 
faced with an online clearing problem of deciding which buy and sell bids to match 
without knowing what bids will arrive in the future. For maximizing surplus, we 
present a (randomized) online algorithm with competitive ratio ln(p_max ...



5 Better algorithms for unfair metrical task systems and applications 

Amos Fiat , Manor Mendel 

Proceedings of the thirty-second annual ACM symposium on Theory of computing
May 1999



6 Tracking the best regressor 
Mark Herbster , Manfred K. Warmuth 

Proceedings of the eleventh annual conference on Computational learning theory
July 1998

1 Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule 80%

Nicolo Cesa-Bianchi , Philip M. Long , Manfred K. Warmuth

Proceedings of the sixth annual conference on Computational learning theory
August 1993



2 How to use expert advice 

Nicolo Cesa-Bianchi , Yoav Freund , David P. Helmbold , David Haussler , Robert E. 
Schapire , Manfred K. Warmuth 

Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
June 1993



3 Universal sequential learning and decision from individual data
sequences

Neri Merhav , Meir Feder 

Proceedings of the fifth annual workshop on Computational learning theory July 
1992 